When starting off with deep learning, one of the first questions to ask is: which framework should I learn?
Common choices include Theano, TensorFlow, Torch, and Keras. Each of these has its own pros and cons and its own way of doing things.
From The Anatomy of Deep Learning Frameworks
The core components of a deep learning framework we must consider are:
- How Tensor Objects are defined. At the heart of the framework is the tensor object. A tensor is a generalization of a matrix to n dimensions. We need a Tensor Object that supports storing data in the form of tensors. Not just that: we would like the object to be able to convert other data types (images, text, video) into tensors and back, to support indexing and operator overloading, to have a space-efficient way to store the data, and so on.
- How Operations on the Tensor Object are defined. A neural network can be considered as a series of Operations performed on an input tensor to give an output.
- The use of a Computation Graph and its optimizations. Instead of implementing operations as functions, they are usually implemented as classes (a minimal sketch of this idea follows this list). This allows us to store more information about each operation, such as the computed shape of its output (useful for sanity checks), how to compute the gradient or the gradient itself (for auto-differentiation), and how to decide whether to run the op on GPU or CPU. The power of neural networks lies in the ability to chain multiple operations to form a powerful approximator. The standard use case is that you initialize a tensor, perform operation after operation on it, and finally interpret the resulting tensor as labels or real values. Unfortunately, as you chain more and more operations together, issues arise that can drastically slow down your code or introduce bugs, and it becomes necessary to see the bigger picture to even notice that they exist. We need a way to optimize the resulting chain of operations for both space and time. This is the job of the Computation Graph, which is basically an object that contains links to the instances of the various Ops, records which operation takes the output of which other operation, and stores additional information.
- The use of Auto-differentiation tools. Another benefit of having the computational graph is that calculating gradients used in the learning phase becomes modular and straightforward to compute.
- The use of BLAS/cuBLAS and cuDNN extensions for maximizing performance. BLAS, or Basic Linear Algebra Subprograms, is a collection of optimized matrix operations, initially written in Fortran. These can be leveraged to do very fast matrix (tensor) operations and can provide significant speedups. There are many other software packages, such as Intel MKL and ATLAS, that perform similar functions. BLAS packages are usually optimized assuming that the instructions will run on a CPU. In the deep learning setting this is not the case, and BLAS may not be able to fully exploit the parallelism offered by GPGPUs. To solve this issue, NVIDIA has released cuBLAS, which is optimized for GPUs and is now included with the CUDA toolkit.
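As a toy illustration of the ideas above (a hypothetical pure-Python sketch, not code from any of the frameworks mentioned), an operation can be written as a class that keeps links to its inputs and knows both its forward computation and its gradient:
class MultiplyOp:
    """Toy operation node: stores links to its input nodes (the graph edges)."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def forward(self):
        # compute the output of the node
        self.output = self.a * self.b
        return self.output

    def gradient(self, grad_output=1.0):
        # d(a*b)/da = b and d(a*b)/db = a, scaled by the incoming gradient
        return grad_output * self.b, grad_output * self.a

op = MultiplyOp(2.0, 3.0)
print(op.forward())   # 6.0
print(op.gradient())  # (3.0, 2.0)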
The computational model for TensorFlow (tf) is a directed graph. Nodes are functions (operations in tf terminology) and edges are tensors. Tensors are multidimensional data arrays.
For example, the function
$$f(a,b) = (a*b) + (a+b)$$
can be expressed as such a graph. There are several reasons for this design; a central one is that tf uses automatic differentiation to compute the derivative of every node with respect to any other node that can affect the first node's output. The primary API of tf (written in C++) is accessed through Python.
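For instance (a minimal sketch with assumed values), tf.gradients adds the graph nodes that compute such derivatives for us:
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = x * x + 3.0 * x          # y = x^2 + 3x
grad = tf.gradients(y, x)     # nodes computing dy/dx = 2x + 3

with tf.Session() as sess:
    print(sess.run(grad, {x: 2.0}))  # [7.0]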
There are different ways of installing tf, depending on your platform and the tf version. Our preferred way is the Docker install.
In [ ]:
import tensorflow as tf
print(tf.__version__)
In [ ]:
# Basic constant operations: assigning values to tensors
a = tf.constant(2)
b = tf.constant(3)
c = a+b
d = a*b
e = c+d
# non interactive session
with tf.Session() as sess:
    print("a=2")
    print("b=3")
    print("(a+b)+(a*b) = %i" % sess.run(e))
You can create initialized tensors in many ways:
In [ ]:
a = tf.zeros([2,3], tf.int32)
b = tf.ones([2,3], tf.int32)
c = tf.fill([3,3], 23.9)
d = tf.range(0,10,1)
with tf.Session() as sess:
    print(sess.run(a))
    print(sess.run(b))
    print(sess.run(c))
    print(sess.run(d))
tf sequences are not iterable!
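For example (a small sketch of this point), looping over a tf.range tensor directly fails; you have to evaluate it first and iterate the resulting NumPy array:
import tensorflow as tf

seq = tf.range(0, 5)
# for i in seq:              # raises TypeError: tensors are not iterable in graph mode
#     print(i)
with tf.Session() as sess:
    for i in sess.run(seq):  # evaluate first, then iterate the NumPy array
        print(i)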
We can also generate random variables:
In [ ]:
a = tf.random_normal([2,2], 0.0, 1.0)
b = tf.random_uniform([2,2], 0.0, 1.0)
with tf.Session() as sess:
    print(sess.run(a))
    print(sess.run(b))
How do we generate randomly shuffled numbers in TensorFlow?
In [ ]:
idx = tf.constant(20)
idx_list = tf.range(idx) # 0~19
shuffle = tf.random_shuffle(idx_list)
with tf.Session() as sess:
    a, b = sess.run([idx_list, shuffle])
    print(a)
    print(b)
In [ ]:
# Basic operations with variable graph input
a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)
c = tf.add(a,b)
d = tf.mul(a,b)
e = tf.add(c,d)
values = {a: 5, b: 3}
# non interactive session
with tf.Session() as sess:
    print('a = %i' % sess.run(a, values))
    print('b = %i' % sess.run(b, values))
    print("(a+b)+(a*b) = %i" % sess.run(e, values))
A computational graph is a series of functions chained together, each passing its output to zero, one or more functions further along the chain.
In this way we can construct very complex transformations on data by using a library of simple functions.
Nodes represent some sort of computation being done in the graph context.
Edges are the actual values (tensors) that get passed to and from nodes.
Values running on edges are tensors:
In [ ]:
# Basic operations with variable as graph input
a = tf.placeholder(tf.int16,shape=[2])
b = tf.placeholder(tf.int16,shape=[2])
c = tf.add(a,b)
d = tf.mul(a,b)
e = tf.add(c,d)
variables = {a: [2,2], b: [3,3]}
# non interactive session
with tf.Session() as sess:
    print(sess.run(a, variables))
    print(sess.run(b, variables))
    print(sess.run(e, variables))
In [ ]:
# your code here
There are certain connections between nodes that are not allowed: you cannot create circular dependencies.
Dependency: any node A that is required for the computation of a later node B is said to be a dependency of B.
The main reason is that circular dependencies create endless feedback loops.
There is one exception to this rule: recurrent neural networks. In this case tf simulates this kind of dependency by copying a finite number of versions of the graph, placing them side by side, and feeding them into one another in sequence. This process is referred to as unrolling the graph.
Keeping track of dependencies is a basic feature of tf. Let's suppose that we want to compute the output value of the mul node. We can see in the unrolled graph that it is not necessary to compute the full graph to get the output of that node. But how do we ensure that we only compute the necessary nodes?
It's pretty easy: traverse from the requested node towards its inputs, pushing each node onto a stack only after all of its dependencies have been pushed. The stack will then be ordered in a way that we are guaranteed to be able to run each node in the stack as we iterate through it.
The main thing to look out for is to keep track of nodes that were already computed and to store their value in memory.
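A minimal sketch of this idea in plain Python (not tf internals, just an illustration): the graph below is a hypothetical dict mapping each node to its dependencies, and the traversal pushes a node onto the stack only after all of its dependencies.
# Hypothetical dependency graph matching the example above
graph = {
    'a': [], 'b': [],
    'add_1': ['a', 'b'],
    'mul_1': ['a', 'b'],
    'add_2': ['add_1', 'mul_1'],
}

def execution_order(node, graph, visited=None, stack=None):
    """Return only the nodes needed to compute `node`, in a safe execution order."""
    if visited is None:
        visited, stack = set(), []
    for dep in graph[node]:
        if dep not in visited:   # skip nodes that were already scheduled
            execution_order(dep, graph, visited, stack)
    visited.add(node)
    stack.append(node)           # pushed only after all of its dependencies
    return stack

print(execution_order('mul_1', graph))  # ['a', 'b', 'mul_1']; add_1 and add_2 are never touched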
As we have seen in the previous code, the tf workflow is a two-step process: first we define the computation graph, then we run it in a Session.
In [ ]:
# graph definition
# we can assign a name to every node
a = tf.placeholder(tf.int32, name='input_a')
b = tf.placeholder(tf.int32, name='input_b')
c = tf.add(a,b,name='add_1')
d = tf.mul(a,b,name='mul_1')
e = tf.add(c,d,name='add_2')
values = {a: 5, b: 3}
# now we can run the graph in an interactive session
sess = tf.Session()
print(sess.run(e, values))
sess.close()
tf has a very useful tool: tensorboard. Let's see how to use it.
In [ ]:
# cleaning the tf graph space
tf.reset_default_graph()
a = tf.placeholder(tf.int16, name='input_a')
b = tf.placeholder(tf.int16, name='input_b')
c = tf.add(a,b,name='add_1')
d = tf.mul(a,b,name='mul_1')
e = tf.add(c,d,name='add_2')
values = {a: 5, b: 3}
# now we can run the graph
# graphs are run by invoking Session objects
session = tf.Session()
# when you are passing an operation to 'run' you are
# asking to run all operations necessary to compute that node
# you can save the value of the node in a Python var
output = session.run(e, values)
print(output)
# now let's visualize the graph
# SummaryWriter is an object where we can save information
# about the execution of the computational graph
writer = tf.train.SummaryWriter('my_graph', session.graph)
writer.close()
# closing interactive session
session.close()
Open a terminal and type in:
tensorboard --logdir="my_graph"
This starts a tensorboard server on port 6006. There, click on the Graphs link. You can see that each of the nodes is labeled based on the name parameter you passed into each operation.
Implement and visualize this graph for a constant tensor [5,3]:
Check these functions in the tf official documentation (https://www.tensorflow.org/): tf.reduce_prod, tf.reduce_sum.
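For reference, a small sketch (with assumed input values) of what these two reductions do:
import tensorflow as tf

t = tf.constant([5, 3])
prod = tf.reduce_prod(t)   # 5 * 3 = 15
total = tf.reduce_sum(t)   # 5 + 3 = 8

with tf.Session() as sess:
    print(sess.run([prod, total]))  # [15, 8]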
In [ ]:
# your code here
tf input data
tf can take several Python var types that are automatically converted to tensors:
tf.constant([5,3], name='input_a')
But tf has a plethora of other data types: tf.int16, tf.quint8, etc.
tf is tightly integrated with NumPy. In fact, tf data types are based on those from NumPy. Tensors returned from Session.run are NumPy arrays. NumPy arrays are the recommended way of specifying tensors.
The shape of a tensor describes both the number of dimensions in the tensor and the length of each dimension. In addition to being able to specify fixed lengths for each dimension, you can also assign a flexible length by passing in None as a dimension's value.
In [ ]:
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
a = tf.placeholder(tf.int16, shape=[2,2], name='input_a')
shape = tf.shape(a)
session = tf.Session()
print(session.run(shape))
session.close()
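The flexible length mentioned above can be seen with a placeholder whose first dimension is None (a small sketch with assumed feed values and a hypothetical placeholder name): the same placeholder accepts batches of different sizes.
import tensorflow as tf
import numpy as np

tf.reset_default_graph()
x = tf.placeholder(tf.int16, shape=[None, 2], name='flexible_input')
shape = tf.shape(x)

with tf.Session() as sess:
    print(sess.run(shape, {x: np.zeros((3, 2), dtype=np.int16)}))  # [3 2]
    print(sess.run(shape, {x: np.zeros((5, 2), dtype=np.int16)}))  # [5 2]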
We can feed data points to a placeholder by iterating through the data set:
In [ ]:
tf.reset_default_graph()
list_a_values = [1,2,3]
a = tf.placeholder(tf.int16)
b = a * 2
with tf.Session() as sess:
    for a_value in list_a_values:
        print(sess.run(b, {a: a_value}))
In [ ]:
import tensorflow as tf
tf.reset_default_graph()
a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)
c = a+b
d = a*b
e = c+d
variables = {a: 5, b: 3}
with tf.Session() as sess:
    print("(a+b)+(a*b) = %i" % sess.run(e, variables))
There are many more TensorFlow operations available.
Once the graph is initialized, we can attach operations to it by using the Graph.as_default() method:
with g.as_default():
    a = tf.mul(2,3)
    ...
tf automatically creates a graph at start-up and assigns it to be the default. Thus, if you are not using Graph.as_default(), any operation will be automatically placed in the default graph.
Creating multiple graphs can be useful if you are defining multiple models that do not have interdependencies:
g1 = tf.Graph()
g2 = tf.Graph()
with g1.as_default():
    ...
with g2.as_default():
    ...
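As a runnable sketch of the skeleton above (the constants and op names are just examples), each graph holds its own operations and is run in its own Session:
import tensorflow as tf

g1 = tf.Graph()
g2 = tf.Graph()

with g1.as_default():
    a = tf.constant(2, name='two')
    b = tf.mul(a, 3, name='times_three')   # this op lives in g1

with g2.as_default():
    c = tf.constant(10, name='ten')
    d = tf.add(c, 5, name='plus_five')     # this op lives in g2

with tf.Session(graph=g1) as sess:
    print(sess.run(b))   # 6
with tf.Session(graph=g2) as sess:
    print(sess.run(d))   # 15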
Variables can be used anywhere you might use a tensor.
tf has a number of helper operations to initialize variables: tf.zeros(), tf.ones(), tf.random_uniform(), tf.random_normal(), etc.
Variable objects live in a Graph, but their state is managed by a Session. Because of this, they need an extra initialization step:
import tensorflow as tf
tf.reset_default_graph()
a = tf.Variable(3,name="my_var")
b = tf.add(5,a)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    ...
In order to change the value of a Variable we can use the Variable.assign() method:
In [ ]:
import tensorflow as tf
tf.reset_default_graph()
a = tf.Variable(3,name="my_var")
b = a.assign(tf.mul(2,a)) # variables are objects, not ops.
# The statement a.assign(...) does not actually assign anything to a,
# but rather creates a tf.Operation that you have to explicitly
# run to update the variable.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print("a:", a.eval())  # variables are objects, not ops.
    print("b:", sess.run(b))
    print("b:", sess.run(b))
    print("b:", sess.run(b))
In [ ]:
tf.reset_default_graph()
a = tf.Variable(3,name="my_var")
b = a.assign(tf.mul(2,a))
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(a.eval())
In [ ]:
tf.reset_default_graph()
a = tf.Variable(3,name="my_var")
b = a.assign(tf.mul(2,a))
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    sess.run(b)
    print(a.eval())
We can increment and decrement variables:
In [ ]:
import tensorflow as tf
tf.reset_default_graph()
a = tf.Variable(3,name="my_var")
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(a.assign_add(1)))
    print(sess.run(a.assign_sub(1)))
    print(sess.run(a.assign_sub(1)))
    # re-running the initializer resets a to its initial value (3)
    sess.run(tf.initialize_all_variables())
    print(sess.run(a))
Some classes of tf (e.g. Optimizer) are able to automatically change variable values without being explicitly asked to do so.
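A minimal sketch of this behavior (the loss and learning rate are just assumed for illustration): the Optimizer's training op updates the Variable on every run, without any explicit assign call.
import tensorflow as tf

tf.reset_default_graph()
w = tf.Variable(5.0, name="w")
loss = tf.square(w)                                  # minimized at w = 0
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(3):
        sess.run(train_step)                         # each run modifies w
        print(sess.run(w))                           # w moves toward 0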
Tensorflow sessions maintain values separately; each Session can have its own current value for a variable defined in the graph:
In [ ]:
tf.reset_default_graph()
a = tf.Variable(10)
sess1 = tf.Session()
sess2 = tf.Session()
sess1.run(tf.initialize_all_variables())
sess2.run(tf.initialize_all_variables())
print(sess1.run(a.assign_add(10)))
print(sess2.run(a.assign_sub(2)))
sess1.close()
sess2.close()
In [ ]:
import tensorflow as tf
tf.reset_default_graph()
with tf.name_scope("Scope_A"):
a = tf.add(1, 2, name="A_add")
b = tf.mul(a, 3, name="A_mul")
with tf.name_scope("Scope_B"):
c = tf.add(4, 5, name="B_add")
d = tf.mul(c, 6, name="B_mul")
e = tf.add(b, d, name="output")
writer = tf.train.SummaryWriter('./name_scope_1', graph=tf.get_default_graph())
writer.close()
We can start tensorboard to see the graph: tensorboard --logdir="./name_scope_1". You can expand the name scope boxes by clicking +.
Let's build and visualize a more complex model with tensorboard.
In [ ]:
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
# Explicitly create a Graph object
graph = tf.Graph()
with graph.as_default():
    with tf.name_scope("variables"):
        # your code here
    # Primary transformation Operations
    with tf.name_scope("transformation"):
        # Separate input layer
        with tf.name_scope("input"):
            # your code here
        # Separate middle layer
        with tf.name_scope("intermediate_layer"):
            # your code here
        # Separate output layer
        # your code here
    with tf.name_scope("update"):
        # Increments the total_output Variable by the latest output
        # your code here
    # Summary Operations
    with tf.name_scope("summaries"):
        avg = tf.div(update_total, tf.cast(increment_step, tf.float32), name="average")
        # Creates summaries for output node
        tf.scalar_summary(b'Output', output, name="output_summary")
        tf.scalar_summary(b'Sum of outputs over time', update_total, name="total_summary")
        tf.scalar_summary(b'Average of outputs over time', avg, name="average_summary")
    # Global Variables and Operations
    with tf.name_scope("global_ops"):
        # Initialization Op
        init = tf.initialize_all_variables()
        # Merge all summaries into one Operation
        merged_summaries = tf.merge_all_summaries()
# Start an interactive Session, using the explicitly created Graph
sess = tf.Session(graph=graph)
# Open a SummaryWriter to save summaries
writer = tf.train.SummaryWriter('./improved_graph', graph)
# Initialize Variables
sess.run(init)
Let's write a function to run the graph several times:
In [ ]:
def run_graph(input_tensor):
    """
    Helper function; runs the graph with the given input tensor and saves summaries
    """
    feed_dict = {a: input_tensor}
    out, step, summary = sess.run([output, increment_step, merged_summaries],
                                  feed_dict=feed_dict)
    writer.add_summary(summary, global_step=step)
In [ ]:
# Run the graph with various inputs
run_graph([2,8])
run_graph([3,1,3,3])
run_graph([8])
run_graph([1,2,3])
run_graph([11,4])
run_graph([4,1])
run_graph([7,3,1])
run_graph([6,3])
run_graph([0,2])
run_graph([4,5,6])
# Write the summaries to disk
writer.flush()
# Close the SummaryWriter
writer.close()
# Close the session
sess.close()
To start TensorBoard after running this code, run the following command:
tensorboard --logdir='./improved_graph'